Handling large databases in data mining

Author

  • M. Mehdi Owrang O.
Abstract

M. Mehdi Owrang O., American University, Dept. of Computer Science & IS, Washington, DC 20016

Current database technology involves processing a large volume of data in order to discover new knowledge. The high volume of data makes the discovery process computationally expensive. In addition, real-world databases tend to be incomplete, redundant, and inconsistent, which can lead to the discovery of redundant and inconsistent knowledge. We propose using domain knowledge to reduce the size of the database considered for discovery and to optimize the hypothesis (which represents the pattern to be discovered) by eliminating implied, unnecessary, and redundant conditions from it. The benefits can include greater efficiency and the discovery of more meaningful, non-redundant, non-trivial, and consistent rules.
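The abstract does not say how the domain knowledge is represented, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes domain knowledge comes in two hypothetical forms: a set of attributes known to be derivable from other attributes (and so safe to drop before discovery), and direct implication rules between "attribute = value" conditions. The names `DERIVED_ATTRIBUTES`, `IMPLIES`, `reduce_database`, and `optimize_hypothesis` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# A condition is read as "attribute = value".
Condition = Tuple[str, str]

# Hypothetical domain knowledge: the condition on the left directly
# implies every condition in the set on the right.
IMPLIES: Dict[Condition, Set[Condition]] = {
    ("rank", "professor"): {("degree", "PhD")},
}

# Hypothetical domain knowledge: attributes that can be derived from
# other attributes (here, "age" from "birth_year") and so add nothing.
DERIVED_ATTRIBUTES: Set[str] = {"age"}


def reduce_database(rows: List[dict], derived: Set[str]) -> List[dict]:
    """Drop derivable attributes so discovery scans fewer columns."""
    return [{k: v for k, v in row.items() if k not in derived} for row in rows]


def optimize_hypothesis(
    conditions: List[Condition],
    implies: Dict[Condition, Set[Condition]],
) -> List[Condition]:
    """Remove duplicate conditions and conditions implied by a kept one.

    Only direct implications are checked; no transitive closure is computed.
    """
    unique = list(dict.fromkeys(conditions))  # de-duplicate, keep order
    dropped: Set[Condition] = set()
    kept: List[Condition] = []
    for cond in unique:
        # Compare only against conditions that have not been dropped, so two
        # mutually implying conditions do not eliminate each other.
        others = [o for o in unique if o != cond and o not in dropped]
        if any(cond in implies.get(o, set()) for o in others):
            dropped.add(cond)
        else:
            kept.append(cond)
    return kept


if __name__ == "__main__":
    rows = [{"name": "a", "birth_year": 1960, "age": 40}]
    print(reduce_database(rows, DERIVED_ATTRIBUTES))

    hypothesis = [
        ("rank", "professor"),
        ("degree", "PhD"),       # implied by rank = professor
        ("rank", "professor"),   # duplicate
        ("dept", "CS"),
    ]
    print(optimize_hypothesis(hypothesis, IMPLIES))
```

In this sketch the reduced hypothesis tests fewer conditions per row and the reduced table has fewer columns, which is the kind of efficiency gain the abstract refers to. How the paper actually encodes and applies domain knowledge may differ.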


Similar resources

A Model for Extracting Information from Text Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...


Scalable Model-based Clustering Algorithms for Large Databases and Their Applications

With the unabated growth of data amassed from business, scientific, and engineering disciplines, cluster analysis and other data mining functionalities play an increasingly important role. They can reveal previously unknown and potentially useful patterns and relations in large databases. One of the most significant challenges in data mining is scalability: effectively handling large databases...


Data Mining and Knowledge Discovery in Molecular Databases - Session Introduction

The development and growth of molecular databases over the last decade have brought a growing problem to the biocomputing community. Our ability to analyze, summarize, and extract information from these databases has lagged far behind our ability to collect and store data. In addition, traditional methods for handling data, whether automated or manual, cannot be effectively applied because of the volume...


Generalization and Decision Tree Induction: Efficient Classification in Data Mining

Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration of efficient induction in large databases and the analysis of data at multiple abstraction levels. This paper addresses the efficiency and scalability issues by proposing a data classification metho...


Distributed Algorithm for Frequent Pattern Mining using Hadoop MapReduce Framework

With the rapid growth of information technology, mining frequent patterns and finding associations among them in many business applications requires handling large and distributed databases. Because the FP-tree is considered the best compact data structure for holding data patterns in memory, there have been efforts to make it parallel and distributed so that it can handle large databases. However, it incurs l...


A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends, or relationships by searching through large amounts of data stored in repositories, corporate databases, and data warehouses. Industrial procedures, carried out with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...




Publication date: 2000